Introducing the FMM Procedure for Finite Mixture Models
ABSTRACT
You’ve collected the data and performed a preliminary analysis with a linear regression. But the residuals have several modes, and transformations don’t help. You need a different approach, and that calls for the FMM procedure. PROC FMM fits finite mixture models, which enable you to describe your data with mixtures of different distributions so that you can account for underlying heterogeneity and address overdispersion. PROC FMM offers a wide selection of continuous and discrete distributions, and it provides automated model selection to help you choose the number of components. Bayesian techniques are also available for many analyses. This paper provides an overview of the capabilities of the FMM procedure and illustrates them with applications drawn from a variety of fields.

INTRODUCTION

Most statistical methods assume that you have a sample of observations, all of which come from the same distribution, and that you are interested in modeling that one distribution. If you actually have data from more than one distribution, with no information to identify which observation goes with which distribution, standard models won’t help you. However, finite mixture models might come to the rescue. They model the data with a mixture of parametric distributions, estimating both the parameters of the separate component distributions and the probability of component membership for each observation.

Finite mixture models provide a flexible framework for analyzing a variety of data. Suppose your objective is to describe the distribution of a response variable. If the corresponding data are multimodal, skewed, heavy-tailed, or exhibit excess kurtosis, no single standard distribution might describe them well. In this case, you often use a nonparametric method such as kernel density estimation to describe the distribution. A kernel density estimate produces a smoothed, numerical approximation to the unknown distribution function and estimates the distribution’s percentiles. Although this approach is useful, it might not be the most concise way to describe an unknown distribution. A finite mixture model provides a parametric alternative that describes the unknown distribution as a mixture of known distributions. A finite mixture model also enables you to assess the probabilities of events or simulate draws from the unknown distribution, just as you would if your data came from a known distribution.

Finite mixture models also provide a parametric modeling approach to one-dimensional cluster analysis. This approach uses the fitted component distributions and the estimated mixing probabilities to compute a posterior probability of component membership for each observation, and the observation is assigned to the component with the maximum posterior probability. A benefit of a model-based approach to clustering is that it permits estimation and hypothesis testing within the framework of standard statistical theory (McLachlan and Basford 1988).

Finally, finite mixture models provide a mechanism that can account for unobserved heterogeneity in the data. Important classifications of the data (such as region, age group, or gender) are not always measured. These latent classification variables can introduce underdispersion, overdispersion, or heteroscedasticity into a traditional model. Finite mixture models overcome these problems through their more flexible form.
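To fix ideas, the following notation is standard in the finite-mixture literature (it is not reproduced from this paper). A k-component finite mixture density and the posterior membership probability used for model-based clustering can be written as

f(y) = \sum_{j=1}^{k} \pi_j \, p_j(y \mid \theta_j), \qquad \sum_{j=1}^{k} \pi_j = 1, \quad \pi_j \ge 0,

and, by Bayes’ rule, the posterior probability that observation y_i belongs to component j is

\Pr(j \mid y_i) = \frac{\pi_j \, p_j(y_i \mid \theta_j)}{\sum_{m=1}^{k} \pi_m \, p_m(y_i \mid \theta_m)},

which is the quantity maximized when assigning each observation to a component.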
FINITE MIXTURE MODELS

Consider a data set that is composed of people’s body weights. Figure 1 presents the pooled data. The histogram indicates an asymmetric distribution with three modes. Figure 2 displays separate histograms by age group and gender. Each of these distributions is symmetric, with only one mode.
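As a minimal sketch of how such a trimodal response might be modeled with PROC FMM, a three-component normal mixture can be requested on the MODEL statement. The data set and variable names below (weights, Weight) are hypothetical, and the option names should be verified against the PROC FMM documentation:

   proc fmm data=weights;                 /* 'weights' is a hypothetical data set name        */
      model Weight = / dist=normal k=3;   /* intercept-only mixture of three normal components */
   run;

   /* When the number of components is not known in advance, a range of      */
   /* candidate models can be searched and the procedure selects among them  */
   /* by an information criterion (verify option names in the documentation): */
   proc fmm data=weights;
      model Weight = / dist=normal kmin=1 kmax=5;
   run;

A BAYES statement can also be added to request Bayesian estimation for the models that support it, consistent with the capabilities noted in the abstract.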